Nearest-neighbour Searching in Files of Text Signatures Using Transputer Networks
Abstract
This paper discusses the implementation of nearest-neighbour document retrieval in serial files using transputer networks. The system uses a two-stage retrieval algorithm in which an initial text-signature search is used to exclude large numbers of documents from the detailed and time-consuming pattern-matching search. The latter is implemented using a processor farm, so that documents which match at the signature level can be examined in parallel to determine whether they are, in fact, a good match for the query. The results demonstrate that communication is the critical factor in all of the transputer networks that were investigated. A high degree of speed-up can be obtained when only the pattern-matching search is carried out. When text signatures are used, however, the speed-up is less, decreasing in line with an increase in the size of the text signatures that are used.
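The two-stage scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation. It makes several assumptions: a 64-bit superimposed-coding signature built with `zlib.crc32`, a simple shared-word count standing in for the detailed pattern-matching search, and a thread pool standing in for the transputer processor farm. All function names and parameters (`signature`, `detailed_score`, `search`, `min_bits`, `workers`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

SIG_BITS = 64  # assumed signature width; the paper varies the signature size


def words(text):
    """Lower-cased alphabetic words of a text, as a set."""
    return {w for w in text.lower().split() if w.isalpha()}


def signature(text, bits=SIG_BITS):
    """Superimposed-coding signature: each word sets one bit (assumed scheme)."""
    sig = 0
    for w in words(text):
        sig |= 1 << (zlib.crc32(w.encode()) % bits)
    return sig


def detailed_score(query, doc):
    """Stand-in for the slow pattern-matching stage: count of shared words."""
    return len(words(query) & words(doc))


def search(query, docs, min_bits=1, workers=4, top=3):
    """Rank documents against the query using a signature filter, then a farm."""
    qsig = signature(query)
    # Stage 1: cheap signature comparison excludes most documents.
    candidates = [
        i for i, d in enumerate(docs)
        if bin(qsig & signature(d)).count("1") >= min_bits
    ]
    # Stage 2: surviving documents are scored in parallel by the worker pool.
    with ThreadPoolExecutor(max_workers=workers) as farm:
        scores = list(farm.map(lambda i: detailed_score(query, docs[i]), candidates))
    # Highest score first; ties broken by document position.
    ranked = sorted(zip(candidates, scores), key=lambda p: (-p[1], p[0]))
    return ranked[:top]


if __name__ == "__main__":
    docs = [
        "parallel text retrieval on transputer networks",
        "cooking pasta at home",
        "text signature search",
    ]
    print(search("text signature retrieval", docs))
```

In a real system the document signatures would be computed once and stored in a serial file rather than rebuilt for every query. Because different words can set the same bit, the first stage may pass documents that do not actually match; the second stage is what rejects them.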
Similar resources
Paragraph-based nearest neighbour searching in full-text documents
This paper discusses the searching of full-text documents to identify paragraphs that are relevant to a user request. Given a natural language query statement, a nearest neighbour search involves ranking the paragraphs comprising a full-text document in order of descending similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has...
Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten
Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheques, and conversion of any handwritten document into structured text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...
Augmenting Approximate Similarity Searching with Lexical Information
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to compare context vectors extracted from large corpora scales poorly. The Spatial Approximation Sample Hierarchy (SASH) is a data-structure for performing approximate nearest-neighbour queries, and has been previou...
Editorial: special issue on information retrieval
Information is continuing to grow exponentially and the increasing utilization of electronic and optical publishing technologies is making available large machine-readable document collections. There is a strong need for sophisticated and innovative retrieval systems which can provide satisfactory access to such amounts of stored information. Information Retrieval (IR) has been developing from ...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to many kinds of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
Journal: Electronic Publishing
Volume: 4
Publication date: 1991